Scalable prediction failure analysis for memory used in modern computers

ABSTRACT

One embodiment provides a method for scalable predictive failure analysis. Embodiments of the method may include gathering memory information for memory on a user computer system having at least one processor. Further, the method includes selecting one or more memory-related parameters. Further still, the method includes calculating, based on the gathering and the selecting, a single bit error value for the scalable predictive failure analysis through calculations for each of the one or more memory-related parameters that utilize the memory information. Yet further, the method includes setting, based on the calculating, the single bit error value for the user computer system.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/011,222, filed Aug. 27, 2013, which is a continuation of U.S. patent application Ser. No. 12/912,735, filed on Oct. 26, 2010.

BACKGROUND

Memory correctable errors are becoming a major issue in today's modern personal computers, especially since supported memory sizes often reach terabytes instead of gigabytes. To that end, complex predictive failure analyses are desirable in order to anticipate and prevent mild to catastrophic system failures involving data loss and damage due to memory errors.

BRIEF SUMMARY

One embodiment provides a method for scalable predictive failure analysis. Embodiments of the method may include gathering memory information for memory on a user computer system having at least one processor. Further, the method includes selecting one or more memory-related parameters from a plurality. Further still, the method includes calculating, based on the gathering and the selecting, a single bit error value for the scalable predictive failure analysis through calculations for each of the one or more memory-related parameters that utilize the memory information. Yet further, the method includes setting, based on the calculating, the single bit error value for the user computer system.

Another embodiment provides a computer program product for scalable predictive failure analysis. The computer program product includes a computer readable storage device. Further, the computer program product includes first program instructions to gather memory information for memory on a user computer system having at least one processor. Further still, the computer program product includes second program instructions to select one or more memory-related parameters. Yet further, the computer program product includes third program instructions to calculate, based on the gather and the select (i.e., performing the instructions to gather and to select), a single bit error value for the scalable predictive failure analysis through calculations for each of the one or more memory-related parameters that utilize the memory information. Still further, the computer program product includes fourth program instructions to set, based on the calculate (i.e., performing the instructions to calculate), the single bit error value for the user computer system, wherein the first, second, third, and fourth program instructions are stored on the computer readable storage device.

Another embodiment provides a system for scalable predictive failure analysis. The system includes a processor, a computer readable memory and a computer readable storage device. Further, the system includes first program instructions to gather memory information for memory on a user computer system having at least one processor, wherein the memory may be the same as, part of, or different from the computer readable memory. Further still, the system includes second program instructions to select one or more memory-related parameters. Yet further, the system includes third program instructions to calculate, based on the gather and the select, a single bit error value for the scalable predictive failure analysis through calculations for each of the one or more memory-related parameters that utilize the memory information. Further still, the system includes fourth program instructions to set, based on the calculate, the single bit error value for the user computer system. The first, second, third, and fourth program instructions of the system are stored on the computer readable storage device for execution by the processor via the computer readable memory.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

So that the manner in which the above recited features, advantages and objects of the present disclosure are attained and can be understood in detail, a more particular description of this disclosure, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only example embodiments of this disclosure and, therefore, are not to be considered limiting of its scope, for this disclosure may admit to other equally effective embodiments.

FIG. 1 depicts an example embodiment of a system for scalable predictive failure analysis in accordance with this disclosure.

FIG. 2 depicts a block diagram of an example embodiment of a computer system suitable for scalable predictive failure analysis, such as a user computer system.

FIG. 3 depicts an example embodiment of a flowchart to show a method for scalable predictive failure analysis in accordance with this disclosure.

FIG. 4 depicts another diagram of an example embodiment of a computer system suitable for scalable predictive failure analysis, such as a user computer system.

DETAILED DESCRIPTION

The following is a detailed description of example embodiments with accompanying drawings. The example embodiments are in such detail as to communicate the invention. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

Generally speaking, systems, methods and media for scalable predictive failure analysis (SPFA) for single bit errors (SBE) in memory are disclosed. Embodiments include gathering, for a user computer system, memory information, such as memory size, synchronous dynamic random access memory (SDRAM) technology on the module, module packaging, memory failure mode and vendor quality. Calculation of the SBE value ensues through combining calculation(s) for each of the selected memory-related parameters, wherein the selecting optionally occurs subsequent or prior to the gathering. The calculated SBE value is set and remains valid for the user computer system until the system is powered down or its memory components are changed. Accordingly, the SBE value is scalable because the value is determined for the particular user computer system rather than being a fixed, generic value. Alerts, whether audible or visible, may occur based on comparing counted SBEs to the scalable SBE value. The alerts provide credible predictive failure analysis to avert system memory failures while accounting for the unique complexities of the particular user computer system.

In general, the routines executed to implement the embodiments of the invention may be part of a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

While specific embodiments will be described below with reference to particular configurations of hardware and/or software, those of skill in the art will realize that embodiments of the present invention may advantageously be implemented with other substantially equivalent hardware, software systems, manual operations, or any combination of any or all of these. The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc. Moreover, embodiments of the invention may also be implemented via parallel processing using a parallel computing architecture, such as one using multiple discrete systems (e.g., plurality of computers, etc.) or an internal multiprocessing architecture (e.g., a single system with parallel processing capabilities).

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of embodiments of the invention described herein may be stored or distributed on computer-readable medium as well as distributed electronically over the Internet or over other networks, including wireless networks. Data structures and transmission of data (including wireless transmission) particular to aspects of the invention are also encompassed within the scope of the invention. Furthermore, the invention can take the form of a computer program product accessible from a computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.

Each software program described herein may be operated on any type of data processing system, such as a personal computer, server, etc. A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements may include local memory employed during execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks, including wireless networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

Turning now to the drawings, FIG. 1 depicts a user computer system 100 having a collection of cooperating, algorithmic modules for SPFA calculations. The enabling logic for modules 110, 115, 120, 130, 140, 145 is reduced to software and/or hardware. The modules 110, 115, 120, 130, 140, 145 are located, for example, within the operating system of a user computer system 100. In alternative example embodiments, any of the modules 110, 115, 120, 130, 140, 145 may be located remotely but in network communication with the user computer system 100. For example, some of the modules 110, 115, 120, 130, 140, 145 may be located on other computer systems, with the manipulations and calculations of the generated data provided as a Web service.

Regardless of individual logic location, the system 100 has accessible logic to gather memory information for memory 105 on the user computer system 100. The gathering module 110 gathers memory information, such as memory size, synchronous dynamic random access memory (SDRAM) technology on the module, module packaging, memory failure mode and vendor quality, for memory 105 under test on the particular user computer system 100. For example, memory information for memory 105 could be a module size of 2 GB for a single-rank dual in-line memory module (DIMM). Memory information is discussed further below, together with the selected memory-related parameters.
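As an illustration only, the gathered memory information for one module might be held in a small record like the Python sketch below. The `MemoryInfo` class, its field names, and the `describe` helper are hypothetical and do not come from the disclosure. Memory failure mode is omitted because it governs how SBEs are counted rather than being a static module attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInfo:
    """Hypothetical record of memory information gathered for one module."""
    size_gb: int      # module capacity in GB, e.g. 2
    ranks: int        # 1 = single, 2 = dual, 4 = quad rank
    data_width: int   # SDRAM data width: 4 for x4, 8 for x8
    chipkill: bool    # True if Chipkill-style advanced ECC is supported
    vendor: str       # e.g. "Vendor A"
    product: str      # e.g. "Product 1"


RANK_NAMES = {1: "single-rank", 2: "dual-rank", 4: "quad-rank"}


def describe(info: MemoryInfo) -> str:
    """Return a short human-readable summary of the gathered information."""
    return f"{info.size_gb} GB {RANK_NAMES[info.ranks]} x{info.data_width} DIMM"


# The example from the text: a 2 GB single-rank DIMM (remaining fields assumed).
dimm = MemoryInfo(size_gb=2, ranks=1, data_width=8, chipkill=False,
                  vendor="Vendor A", product="Product 1")
print(describe(dimm))  # 2 GB single-rank x8 DIMM
```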

The system 100 also includes logic, denominated as a configuration module 120 in FIG. 1, for selecting one or more memory-related parameters from a plurality of such parameters. A user or administrator of the user computer system 100, for example, selects which memory-related parameters to include in the SPFA calculations. The selecting may occur through textual entry, radio-button selection, or another method for selecting options through a display coupled to the user computer system 100. The selected memory-related parameters, themselves, directly correlate to memory information. That is, memory information regarding memory size correlates to the memory-related parameter for memory size, memory information regarding module packaging correlates to the memory-related parameter for module packaging, and so forth.

In communication with both the gathering and configuration modules 110, 120, the calculation module 130 includes logic to calculate a combination of the selected memory-related parameters. The SPFA uses the selected memory-related parameters, which are considered critical to maintaining a functioning memory subsystem, to calculate the SBE value. The setting module 140 then sets the calculated SBE value for the system 100. An evaluation of exemplary memory-related parameters, and of their combination for calculating the SBE value, follows.

Memory module size is a memory-related parameter for possible inclusion in the SPFA calculation for the memory 105. For such, the following exemplary scale is provided for a correctable SBE value based on the actual capacity of each module or module-pair installed in the system:

TABLE 1
Module Size    Scale Factor (n)    PFA Threshold in Time Window
 2 GB           1                   x
 4 GB           2                   2x
 8 GB           4                   4x
16 GB           8                   8x
32 GB          16                   16x

Referring to Table 1, and assuming x = 256 SBEs as a baseline PFA count within a 24-hour window, a larger memory 105 DIMM logically permits more SBEs before meeting or exceeding a set SBE value, i.e., a threshold. For example, the memory-related parameter for memory module size would allow 256 SBEs for a 2 GB DIMM, 512 SBEs for a 4 GB DIMM, 1024 SBEs for an 8 GB DIMM, 2048 SBEs for a 16 GB DIMM, and 4096 SBEs for a 32 GB DIMM before a memory failure is signaled by a visual and/or audible alert through the detection and comparison modules 115, 145.
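The Table 1 scaling reduces to a lookup followed by a multiplication. The Python sketch below hard-codes the example scale factors above and the example baseline x = 256. The function name `size_threshold` is hypothetical, and real values would be chosen by the system design team.

```python
# Scale factor n per module size in GB, copied from the example values in Table 1.
SIZE_SCALE = {2: 1, 4: 2, 8: 4, 16: 8, 32: 16}


def size_threshold(size_gb: int, x: int = 256) -> int:
    """Return the Table 1 PFA threshold n * x for one module.

    x is the baseline SBE count for a 24-hour window (256 in the example).
    Sizes outside Table 1 raise KeyError rather than being extrapolated.
    """
    return SIZE_SCALE[size_gb] * x


for size in sorted(SIZE_SCALE):
    print(f"{size:>2} GB -> {size_threshold(size)} SBEs")
```

Iterating over the table sizes prints 256, 512, 1024, 2048, and 4096 SBEs for 2 GB through 32 GB, which matches the worked figures above.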

In addition to memory module size, another possibly selected memory-related parameter for inclusion in the calculation of the SBE value is SDRAM technology on the memory module 105. For such, the following exemplary scale is provided:

TABLE 2
Number of Ranks    Scale Factor (m)    PFA Threshold in Time Window
1 (Single)         1                   y
2 (Dual)           1.2                 y/1.2
4 (Quad)           1.6                 y/1.6

Referring to Table 2, and assuming y = 1024 as a baseline PFA count within a 24-hour window, a memory 105 DIMM with fewer ranks permits a higher SBE value. For example, the memory-related parameter for SDRAM technology would allow 1024 SBEs for a single-rank DIMM, 853 SBEs for a dual-rank DIMM (1024/1.2 ≈ 853.3), and 640 SBEs for a quad-rank DIMM before alerting the user, or another system in network communication with the system 100, of a memory failure in a module or other memory device needing repair or replacement. Such a repair or replacement at least suggests that a new SBE value should be set by re-calculation.
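Table 2 can be sketched the same way, dividing the baseline by the rank scale factor. Because 1024/1.2 is not a whole number, this sketch assumes the threshold is rounded down to a whole SBE count. The rounding rule and the `rank_threshold` name are assumptions and are not part of the disclosure.

```python
import math
from fractions import Fraction

# Scale factor m per rank count (Table 2 example values), kept exact with Fraction
# so that 1024 / 1.6 evaluates to exactly 640 instead of a float approximation.
RANK_SCALE = {1: Fraction(1), 2: Fraction("1.2"), 4: Fraction("1.6")}


def rank_threshold(ranks: int, y: int = 1024) -> int:
    """Return the Table 2 PFA threshold y / m, rounded down to whole SBEs."""
    return math.floor(Fraction(y) / RANK_SCALE[ranks])


for ranks in (1, 2, 4):
    print(ranks, rank_threshold(ranks))
```

For single-, dual-, and quad-rank modules this gives 1024, 853, and 640 SBEs. Using `Fraction` rather than floats keeps the arithmetic exact, so the quad-rank case comes out as exactly 640.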

Still another memory-related parameter for inclusion in the calculation of the SBE value is module packaging of the memory 105 on the particular user computer system 100. For such, the following exemplary scale is provided:

TABLE 3
SDRAM Data Width                                   Scale Factor (k)    PFA Threshold in Time Window
x8 (with no IBM® Chipkill™ technology support)     1                   z
x8 (with IBM® Chipkill™ support)                   2                   2z
x4 (with IBM® Chipkill™ support)                   2.5                 2.5z

IBM® Chipkill™ is an advanced error checking and correcting (ECC) computer technology that has the ability to correct multi-bit memory errors on a single SDRAM. Referring to Table 3, and assuming z = 256 as a baseline PFA count within a 24-hour window, a memory 105 DIMM with additional advanced ECC protection, i.e., Chipkill™, affords a higher SBE value for this individual PFA metric. For example, the memory-related parameter regarding Chipkill™ would allow 256 SBEs for an x8 DIMM with no Chipkill™, 512 SBEs for an x8 DIMM with Chipkill™, and 640 SBEs for an x4 DIMM with Chipkill™.
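Table 3 can be sketched as a lookup keyed on data width and Chipkill-style ECC support. The dictionary layout, the rounding down, and the `packaging_threshold` name are illustrative assumptions. Combinations absent from the table, such as x4 without Chipkill, deliberately raise `KeyError` instead of guessing a factor.

```python
import math
from fractions import Fraction

# Scale factor k keyed by (SDRAM data width, Chipkill-style ECC support),
# using the example values of Table 3.
PACKAGING_SCALE = {
    (8, False): Fraction(1),
    (8, True): Fraction(2),
    (4, True): Fraction("2.5"),
}


def packaging_threshold(data_width: int, chipkill: bool, z: int = 256) -> int:
    """Return the Table 3 PFA threshold k * z, rounded down to whole SBEs."""
    return math.floor(PACKAGING_SCALE[(data_width, chipkill)] * z)


print(packaging_threshold(8, False), packaging_threshold(8, True), packaging_threshold(4, True))
```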

Yet another memory-related parameter for optional inclusion in the calculation of the SBE value is memory failure mode of the memory 105 on the particular user computer system 100. Here, this memory-related parameter regards single-count reduction for a single memory address. That is, a correctable SBE that occurs repeatedly at the same memory address on a memory 105 DIMM is counted as one failure instead of as multiple failures.
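The failure-mode parameter changes how SBEs are counted, not the threshold itself. Assuming SBEs are reported as a list of faulting addresses (a hypothetical input format), a minimal sketch counts distinct addresses when the parameter is selected. The `count_sbes` name is illustrative.

```python
from collections.abc import Iterable


def count_sbes(addresses: Iterable[int], single_count_per_address: bool = True) -> int:
    """Count correctable SBEs from a sequence of faulting addresses.

    With single_count_per_address=True, repeats at the same address count once,
    mirroring the failure-mode parameter; otherwise every event is counted.
    """
    events = list(addresses)
    return len(set(events)) if single_count_per_address else len(events)


log = [0x1000, 0x1000, 0x1000, 0x2040]  # hypothetical SBE address log
print(count_sbes(log))         # 2
print(count_sbes(log, False))  # 4
```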

Another example of a memory-related parameter for optional inclusion in the calculation of the SBE value is vendor quality of the memory 105 on the particular user computer system 100. For such, the following exemplary scale is provided:

TABLE 4
Vendor, Product        Quality Scale Factor
Vendor A, Product 1    1
Vendor A, Product 2    0.8
Vendor B, Product 1    1
Vendor C, Product 1    0.5

Table 4 represents a per-product memory vendor quality/reliability matrix. A memory vendor can have multiple products, and each one could have a different quality/reliability rating. A quality rating scale, such as Table 4, may be used in calculating the SBE value. A memory 105 DIMM from a supplier with a lower quality score yields a lower PFA threshold value for this memory-related parameter. Provided all other memory-related parameters contributing to the SBE value are constant, a lower quality score would therefore require replacement or repair sooner than a higher quality score.
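A vendor-quality sketch multiplies a baseline by the Table 4 factor. Table 4 gives no baseline, so the `w = 256` default below is purely hypothetical. The `vendor_threshold` name and the rounding down are also assumptions.

```python
import math
from fractions import Fraction

# Quality scale factor per (vendor, product), from the example values in Table 4.
QUALITY_SCALE = {
    ("Vendor A", "Product 1"): Fraction(1),
    ("Vendor A", "Product 2"): Fraction("0.8"),
    ("Vendor B", "Product 1"): Fraction(1),
    ("Vendor C", "Product 1"): Fraction("0.5"),
}


def vendor_threshold(vendor: str, product: str, w: int = 256) -> int:
    """Return a vendor-quality PFA threshold q * w, rounded down.

    The baseline w is a hypothetical value; Table 4 does not specify one.
    """
    return math.floor(QUALITY_SCALE[(vendor, product)] * w)


for vendor, product in QUALITY_SCALE:
    print(vendor, product, vendor_threshold(vendor, product))
```

With the assumed baseline, Vendor A Product 2 yields 204 SBEs and Vendor C Product 1 yields 128. This shows that a lower quality score lowers the threshold.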

For calculation purposes, the selected memory-related parameters may be combined through simple addition, multiplication, a mixture of the two, or any other combination method that yields a reliable, relative, and meaningful SBE value for SPFA. For example, the foregoing five memory-related parameters may yield an SBE value according to:

$$\mathrm{PFA}_{\text{sum}} = \mathrm{PFA}_{a} + \mathrm{PFA}_{b} + \mathrm{PFA}_{c} + \mathrm{PFA}_{d} + \mathrm{PFA}_{e}$$

The value of each memory-related PFA threshold and time window(s) should be defined by the subject matter expert on the system design team. That is, the illustrative tables provided herein are neither the sole values nor necessarily the appropriate values to use, because they are intended solely as examples. Whether applied during a hardware built-in memory test, a power-on memory test (i.e., power-on self-test), system run time, or a memory diagnostic test, this disclosure enables a selectable and scalable PFA for memory 105 that thwarts the consequences of memory failures for a particular user computer system 100.
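The additive combination can be sketched as a sum over only the parameters the user selected. In this Python sketch the per-parameter thresholds are assumed to be already computed. The example values are the Table 1 to Table 3 figures for a 2 GB, single-rank, x8 DIMM without Chipkill. The failure-mode parameter is left out of the example because the text describes it as a counting rule rather than a threshold table. The function name is hypothetical.

```python
def combined_sbe_value(thresholds: dict[str, int], selected: set[str]) -> int:
    """Sum the per-parameter PFA thresholds for the selected parameters only.

    Mirrors PFA_sum = PFA_a + ... + PFA_e restricted to the selected set;
    unselected parameters simply contribute nothing.
    """
    unknown = selected - set(thresholds)
    if unknown:
        raise ValueError(f"no threshold for parameters: {sorted(unknown)}")
    return sum(thresholds[name] for name in selected)


# Example per-parameter thresholds for a 2 GB single-rank x8 DIMM without Chipkill.
thresholds = {"size": 256, "rank": 1024, "packaging": 256}

print(combined_sbe_value(thresholds, {"size", "rank", "packaging"}))  # 1536
print(combined_sbe_value(thresholds, {"size", "packaging"}))          # 512
```

Selecting size, rank, and packaging gives 256 + 1024 + 256 = 1536, and deselecting rank gives 512. A multiplicative or mixed combination would replace only the `sum` call.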

FIG. 2 depicts a block diagram of one embodiment of a computer system 200 suitable for use in scalable predictive failure analysis. Other configurations of the computer system 200 are possible, including a computer having capabilities other than, and possibly beyond, those ascribed herein. In other embodiments, the computer system 200 may be any combination of processing devices such as workstations, servers, mainframe computers, notebook or laptop computers, desktop computers, PDAs, mobile phones, wireless devices, set-top boxes, or the like. At least certain of the components of computer system 200 may be mounted on a multi-layer planar or motherboard (which may itself be mounted on the chassis) to provide a means for electrically interconnecting the components of the computer system 200.

In the depicted embodiment, the computer system 200 includes a processor 202, storage 204, memory 206, a user interface adapter 208, and a display adapter 210 connected to a bus 212 or other interconnect. The bus 212 facilitates communication between the processor 202 and other components of the computer system 200, as well as communication between components. Processor 202 may include one or more system central processing units (CPUs) or processors to execute instructions, such as an IBM® PowerPC® processor, an Intel® Pentium® processor, an Advanced Micro Devices, Inc. processor or any other suitable processor. IBM and PowerPC are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. The processor 202 may utilize storage 204, which may be non-volatile storage such as one or more hard drives, tape drives, diskette drives, CD-ROM drive, DVD-ROM drive, or the like. The processor 202 may also be connected to memory 206 via bus 212, such as via a memory controller hub (MCH). System memory 206 may include volatile memory such as random access memory (RAM) or double data rate (DDR) synchronous dynamic random access memory (SDRAM). In the disclosed systems, for example, a processor 202 may execute instructions to perform functions such as gathering memory information and selecting memory-related parameters for inclusion in SPFA calculations. Information before, during or after calculations may temporarily or permanently be stored in storage 204 or memory 206.

Turning now to FIG. 3, another aspect of scalable predictive failure analysis for memory associated with a particular user computer system is disclosed. Shown is an example embodiment of a flowchart 300 for improved predictive failure analysis after the SBE value has been set for the user computer system. Flowchart 300 is for a system, such as system 100, notably involving the logic associated with the detection and comparison modules 115, 145 of FIG. 1.

Returning to FIG. 3, flowchart 300 starts 305 with the system detecting 310 SBEs on a DIMM via a system management interrupt (SMI). When the user computer system boots, the BIOS, or another firmware implementation such as the Unified Extensible Firmware Interface (UEFI), establishes the interrupt factors. Upon the memory controller detecting 310 an SBE, an SMI is triggered to wake up the BIOS, which checks 320 the memory-related parameters and the SBE count accumulated so far. Decision block 330 queries whether the SBE count is at least equal to the set SBE value. If yes 340, then the flowchart 300 issues 350 an SPFA alert and optionally provides repair actions, such as displaying a visual notice to replace the specific faulty memory module or suggesting reparative procedures. If no 335, then the flowchart 300 returns to sleep, at least until the next SBE is counted, because the count of SBEs for the particular user computer system is less than the set SBE value. After the alert with optional actions is issued 350, or after the no 335 branch, the flowchart ends 375.
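The decision logic of flowchart 300 can be modeled in ordinary code for testing. The sketch below is hypothetical. In the disclosure, detection is done by the memory controller and the check runs in BIOS/UEFI code woken by an SMI; here, each `on_sbe` call stands in for one SMI. The class and method names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class SbeMonitor:
    """Hypothetical software model of flowchart 300 for one user computer system."""
    threshold: int                                   # SBE value set for this system
    count: int = 0                                   # SBEs accumulated so far
    alerts: list[str] = field(default_factory=list)  # issued SPFA alerts

    def on_sbe(self, dimm: str) -> bool:
        """Handle one detected SBE (310); return True if an alert was issued (350)."""
        self.count += 1                        # update the accumulated count (320)
        if self.count >= self.threshold:       # decision block 330
            self.alerts.append(f"SPFA alert: replace or service {dimm}")
            return True
        return False                           # 335: sleep until the next SBE


monitor = SbeMonitor(threshold=3)
print([monitor.on_sbe("DIMM 1") for _ in range(3)])  # [False, False, True]
```

With a threshold of 3, the first two SBEs return False, and the third returns True and records one alert. Because this sketch never resets the count, every later SBE also triggers an alert. That is one possible policy; the flowchart does not state whether the count is cleared after an alert.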

FIG. 4 illustrates information handling system 401, which is a simplified example of a computer system, such as shown in FIG. 2, for use in scalable predictive failure analysis and capable of performing the operations described herein. Computer system 401 includes processor 400, which is coupled to host bus 405. A level two (L2) cache memory 410 is also coupled to the host bus 405. Host-to-PCI bridge 415 is coupled to main memory 420, includes cache memory and main memory control functions, and provides bus control to handle transfers among PCI bus 425, processor 400, L2 cache 410, main memory 420, and host bus 405. As an alternative to the foregoing, the L2 cache 410, memory controller and north bridge may be integrated into the CPU; the system main memory is then connected to the memory controller inside the CPU. PCI bus 425 provides an interface for a variety of devices including, for example, LAN card 430. PCI-to-ISA bridge 435 provides bus control to handle transfers between PCI bus 425 and ISA bus 440, universal serial bus (USB) functionality 445, IDE device functionality 450, power management functionality 455, and can include other functional elements not shown, such as a real-time clock (RTC), DMA control, interrupt support, and system management bus support. Peripheral devices and input/output (I/O) devices can be attached to various interfaces 460 (e.g., parallel interface 462, serial interface 464, infrared (IR) interface 466, keyboard interface 468, mouse interface 470, fixed disk (HDD) 472, removable storage device 474) coupled to ISA bus 440. Alternatively, many I/O devices can be accommodated by a super I/O controller (not shown) attached to ISA bus 440.

BIOS 480 is coupled to ISA bus 440, and incorporates the necessary processor executable code for a variety of low-level system functions and system boot functions. BIOS 480 can be stored in any computer readable medium, including magnetic storage media, optical storage media, flash memory, random access memory, read only memory, and communications media conveying signals encoding the instructions (e.g., signals from a network). In order to attach computer system 401 to another computer system to copy files over a network, LAN card 430 is coupled to PCI bus 425 and to PCI-to-ISA bridge 435. Similarly, to connect computer system 401 to an ISP to connect to the Internet using a telephone line connection, modem 475 is connected to serial port 464 and PCI-to-ISA Bridge 435.

While the computer systems described in FIGS. 2 and 4 are capable of executing the disclosure described herein, these computer systems are simply examples of computer systems and user computer systems. Those skilled in the art will appreciate that many other computer system designs are capable of performing the disclosure described herein.

Another embodiment of the disclosure is implemented as a program product for use within a device such as, for example, those systems and methods depicted in FIGS. 1 and 3. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of media including but not limited to: (i) information permanently stored on non-volatile storage-type accessible media (e.g., writable and readable as well as read-only memory devices within a computer such as ROM, flash memory, CD-ROM disks readable by a CD-ROM drive); (ii) alterable information stored on writable storage-type accessible media (e.g., readable floppy disks within a diskette drive or hard-disk drive); and (iii) information conveyed to a computer through a network. The latter embodiment specifically includes information downloaded onto either permanent or even merely momentary storage-type accessible media from the World Wide Web, an internet, and/or other networks, such as those known, discussed and/or explicitly referred to herein. Such data-bearing media, when carrying computer-readable instructions that direct the functions of the present disclosure, represent embodiments of the present disclosure.

In general, the routines executed to implement the embodiments of this disclosure may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of this disclosure typically comprises a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described hereinafter may be identified based upon the application for which they are implemented in a specific embodiment of this disclosure. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus this disclosure should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

While the foregoing is directed to example embodiments of this disclosure, other and further embodiments of this disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

What is claimed is:
 1. A method for scalable predictive failure analysis, the method comprising: a processor gathering values for one or more characteristics of a memory of a computer system; the processor selecting one or more memory-related parameters, each of the one or more memory-related parameters representing a different characteristic of the one or more characteristics of the memory, wherein the one or more memory-related parameters include a memory rank parameter; the processor calculating a scalable single bit error threshold value using the values for each of the one or more characteristics of the memory represented by the selected one or more memory-related parameters; and the processor setting the calculated scalable single bit error threshold value for the memory.
 2. The method of claim 1, further comprising the processor re-calculating the scalable single bit error threshold value for the memory subsequent to performance of a reparative procedure.
 3. The method of claim 1, further comprising causing display of a notice in response to a determination that the calculated scalable single bit error threshold value has been met or exceeded.
 4. The method of claim 1, wherein the one or more memory-related parameters include a memory size parameter indicative of a capacity of the memory.
 5. The method of claim 1, wherein the one or more memory-related parameters include a module packaging parameter indicative of whether the memory includes error checking and correcting (ECC) capability.
 6. The method of claim 1, wherein the one or more memory-related parameters include a memory failure mode parameter indicative of how errors are counted for a correctable single bit error that occurs repeatedly at the same address of the memory.
 7. The method of claim 1, wherein the one or more memory-related parameters include a vendor quality rating parameter.
 8. A computer program product for scalable predictive failure analysis, the computer program product comprising: a computer readable storage device; program instructions, stored on the computer readable storage device, to gather values for one or more characteristics of a memory of a computer system; program instructions, stored on the computer readable storage device, to select one or more memory-related parameters, each of the one or more memory-related parameters representing a different characteristic of the one or more characteristics of the memory, wherein the one or more memory-related parameters include a memory rank parameter; program instructions, stored on the computer readable storage device, to calculate a scalable single bit error threshold value using the values for each of the one or more characteristics of the memory represented by the selected one or more memory-related parameters; and program instructions, stored on the computer readable storage device, to set the calculated scalable single bit error threshold value for the memory.
 9. The computer program product of claim 8, wherein the one or more memory-related parameters include a memory size parameter indicative of a capacity of the memory.
 10. The computer program product of claim 8, wherein the one or more memory-related parameters include a module packaging parameter indicative of whether the memory includes error checking and correcting (ECC) capability.
 11. The computer program product of claim 8, wherein the one or more memory-related parameters include a memory failure mode parameter indicative of how errors are counted for a correctable single bit error that occurs repeatedly at the same address of the memory.
 12. The computer program product of claim 8, wherein the one or more memory-related parameters include a vendor quality rating parameter.
 13. A system for scalable predictive failure analysis, the system comprising: a processor, a computer readable memory and a computer readable storage device; program instructions, stored on the computer readable storage device for execution by the processor via the computer readable memory, to gather values for one or more characteristics of a memory of a computer system; program instructions, stored on the computer readable storage device for execution by the processor via the computer readable memory, to select one or more memory-related parameters, each of the one or more memory-related parameters representing a different characteristic of the one or more characteristics of the memory, wherein the one or more memory-related parameters include a memory rank parameter; program instructions, stored on the computer readable storage device for execution by the processor via the computer readable memory, to calculate a scalable single bit error threshold value using the values for each of the one or more characteristics of the memory represented by the selected one or more memory-related parameters; and program instructions, stored on the computer readable storage device for execution by the processor via the computer readable memory, to set the calculated scalable single bit error threshold value for the memory.
 14. The system of claim 13, wherein the one or more memory-related parameters include a memory size parameter indicative of a capacity of the memory.
 15. The system of claim 13, wherein the one or more memory-related parameters include a module packaging parameter indicative of whether the memory includes error checking and correcting (ECC) capability.
 16. The system of claim 13, wherein the one or more memory-related parameters include a memory failure mode parameter indicative of how errors are counted for a correctable single bit error that occurs repeatedly at the same address of the memory.
 17. The system of claim 13, wherein the one or more memory-related parameters include a vendor quality rating parameter.
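As an illustration only, the steps recited in claim 1 can be sketched in Python: gathering characteristic values, selecting parameters (which must include memory rank), calculating a scalable single bit error threshold, and checking it as in claim 3. The claims do not specify a formula. The field names, the baseline of four tolerated errors per GB, and the per-parameter scaling factors below are hypothetical assumptions made for this sketch.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class MemoryInfo:
    """Gathered memory characteristics (hypothetical field names)."""
    size_gb: int
    ranks: int
    has_ecc: bool
    repeats_counted_once: bool
    vendor_quality: float  # hypothetical rating; 1.0 means nominal quality


# Hypothetical baseline: correctable single bit errors tolerated per GB.
BASE_ERRORS_PER_GB = 4.0

# Hypothetical scaling factor contributed by each memory-related parameter.
FACTORS = {
    "memory_size": lambda m: m.size_gb,
    "memory_rank": lambda m: m.ranks,
    "module_packaging": lambda m: 1.0 if m.has_ecc else 0.5,
    "failure_mode": lambda m: 0.5 if m.repeats_counted_once else 1.0,
    "vendor_quality": lambda m: m.vendor_quality,
}


def calculate_threshold(info: MemoryInfo, selected: list[str]) -> int:
    """Scale the baseline by one factor per selected parameter."""
    # Claim 1 requires the memory rank parameter to be among those selected.
    if "memory_rank" not in selected:
        raise ValueError("the memory rank parameter must be selected")
    factor = prod(FACTORS[name](info) for name in selected)
    # Never set a threshold below one error.
    return max(1, round(BASE_ERRORS_PER_GB * factor))


def threshold_reached(error_count: int, threshold: int) -> bool:
    """True once the counted errors meet or exceed the set threshold."""
    return error_count >= threshold


if __name__ == "__main__":
    info = MemoryInfo(size_gb=64, ranks=2, has_ecc=True,
                      repeats_counted_once=False, vendor_quality=1.0)
    threshold = calculate_threshold(
        info, ["memory_size", "memory_rank", "module_packaging"])
    print(threshold)  # 512
    print(threshold_reached(600, threshold))  # True
```

Under these assumptions, selecting the vendor quality parameter with a rating of 0.5 would halve the threshold to 256. The re-calculation after a reparative procedure (claim 2) would call `calculate_threshold` again with freshly gathered values.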